Modelling text prediction systems in low- and high-inflected languages

Authors

  • Nestor Garay-Vitoria
  • Julio Abascal
Abstract

Text prediction was initially proposed to help people with a low text composition speed compose messages more efficiently. Following the significant advances of recent years, text prediction methods can now benefit anyone entering text messages or commands, provided they are adequately integrated into the application's user interface. Different text prediction methods are based on different statistical and linguistic properties of natural languages and are therefore highly dependent on the language concerned. To discuss general issues of text prediction, abstract descriptions of the methods used are needed. This paper presents a number of models applied to text prediction, some oriented to low-inflected languages and others to high-inflected languages. All of these models have been implemented and their results are compared. The models presented may be useful for future discussion. Finally, some comments are made on the comparison of previously published results.

Similar resources

Evaluation of Prediction Methods Applied to an Inflected Language

Prediction is one of the techniques that have been applied to Augmentative and Alternative Communication to help people increase the quality and quantity of text composed per unit of time. Most of the literature has focused on word prediction methods that may easily be applied to non-inflected languages. However, for inflected languages, other approaches that mainly distinguish roots and suffix...

Word Prediction For Inflected Languages: Application To Basque Language

Several word prediction methods to help the communication of people with disabilities can be found in the recent literature. Most of them have been developed for English or other non-inflected languages. While most of these methods can be modified to be used in other languages with similar structures, they may not be directly adapted to inflected languages. In this paper some word prediction te...

Comparison of language modelling techniques for Russian and English

In this paper the main differences between language modelling of Russian and English are examined. A Russian corpus and a comparable English corpus are described. The effects of high inflectionality in Russian and the relationship between the out-of-vocabulary rate and vocabulary size are investigated. Standard word and class N-gram language modelling techniques are applied to the two corpora a...

Data-driven Amharic-English Bilingual Lexicon Acquisition

This paper describes a simple approach of statistical language modelling for bilingual lexicon acquisition from Amharic-English parallel corpora. The goal is to induce a seed translation lexicon from sentence-aligned corpora. The seed translation lexicon contains matches of Amharic lexemes to weakly inflected English words. Purely statistical measures of term distribution are used as the basis ...

Automatic Construction of Morphologically Motivated Translation Models for Highly Inflected, Low-Resource Languages

Statistical Machine Translation (SMT) of highly inflected, low-resource languages suffers from the problem of low bitext availability, which is exacerbated by large inflectional paradigms. When translating into English, rich source inflections have a high chance of being poorly estimated or out-of-vocabulary (OOV). We present a source language-agnostic system for automatically constructing phra...

Journal:
  • Computer Speech & Language

Volume 24, Issue

Pages -

Publication date: 2010